Previous experiments on the effects of distortion in voice communication circuits have shown that intelligibility is impaired surprisingly little by the type of amplitude distortion known as peak clipping. It has been found, in fact, that conversation is possible even over a system that introduces "infinite" peak clipping, i.e., that reduces speech to a succession of rectangular waves in which the discontinuities correspond to the crossings of the time axis in the original speech signal.

The intelligibility of the rectangular speech waves depends critically upon the frequency-response characteristics of the speech transmission circuits used in conjunction with the "infinite clipper." In the present experiments, resistance-capacitance circuits with sloping frequency-response characteristics (tilting circuits) were introduced into the system at points preceding and/or following the clipping circuit. The interactions of the nonuniform frequency characteristics of the resistance-capacitance circuits with the nonlinear amplitude characteristic of the clipping circuit were studied by means of articulation tests. That there was strong interaction is evidenced by the fact that word articulation scores of 97 and 15 percent were obtained with two systems consisting of the same components in different orders. The components were (1) a tilter with a frequency-response characteristic rising 6 db per octave (a "differentiating" circuit), (2) an infinite peak clipper, and (3) a tilter with a frequency-response characteristic falling 6 db per octave (an "integrating" circuit). When these distorters were cascaded in the sequence 1-2-3, the speech output consisted of triangularly shaped waves which sounded very much like normal speech and which were highly intelligible (97 percent). When the reverse sequence (3-2-1) was used, the speech output consisted of sharp pulses giving rise to extremely poor quality and very low intelligibility (15 percent).

Tests with single distorters and with pairs of distorters indicated that: (1) In the absence of frequency distortion, infinitely clipped speech is of poor quality but moderate intelligibility (50 to 90 percent, depending on the listeners' skill and familiarity with the test words). (2) A differentiator or an integrator preceding the clipper determines the degree to which intelligibility is impaired by infinite clipping. (3) A differentiator or an integrator following the clipper (or used alone in a linear system) affects the quality but not the intelligibility of the speech transmitted by the system.

* This research is being carried out under contract with the U. S. Navy, Office of Naval Research (Contract


INTRODUCTION

The waves of normal speech are so irregular and so complex that it is difficult to give even a statistical description of them. An instrument capable of transmitting or storing or reproducing them faithfully must have a dynamic range of 35 db or more, and it must be capable of preserving their highly intricate and ever-changing patterns. Regarded from one point of view, the problem of preserving the variety and complexity of speech waves presents a challenge. Regarded from another point of view, it prompts the question: to what extent can the speech waves be simplified without destroying their intelligibility?

A method of reducing speech to a simple bivariate (on-off) code was suggested by the results of experiments conducted during the war to determine the effects of overload distortion upon intelligibility. It was found in those experiments that, although it suffered in quality and timbre, speech remained at least moderately intelligible no matter how much peak clipping was introduced. In its most extreme form, peak clipping reduced speech to a series of rectangular waves.1 These code-like waves were passed through electronic switching circuits without further impairment of intelligibility. Pulses, marking the instants at which the two-valued rectangular wave switched from one amplitude value to the other, were transmitted via pulse-modulated carrier, detected, and reconverted into intelligible "rectangular speech" with the aid of an Eccles-Jordan trigger circuit.

1 J. C. R. Licklider, Effects of Amplitude Distortion upon the Intelligibility of Speech, Psycho-Acoustic Laboratory Report OSRD No. 4217, 15 November 1944, PB 19775. This report and other Psycho-Acoustic Laboratory reports are available through the Office of Technical Services, U. S. Department of Commerce, Washington, D. C.

It was evident from these observations that a considerable amount of information was contained in the temporal pattern of the crossings of the time axis in the original speech wave, for this temporal pattern was all that remained of the original wave after it had passed through the clipping and switching circuits. The curious feature of this result was not that the information could be represented by a series of elementary waves or impulses suitably spaced in time (this form of representation is the basis of frequency modulation, phase modulation, and pulse-time modulation techniques), but rather that an intelligible temporal pattern should be produced without any modulation procedure at all, simply by clipping off the peaks of the speech wave until it was reduced to a two-valued function of time. The pattern was not, however, highly intelligible. Although conversation was carried on with little difficulty in rectangular speech, articulation scores for discrete words were as low as 50 percent. The problem, if infinite clipping were to provide a usable bivariate code, was to improve intelligibility without sacrificing the simplicity of the rectangular wave form.

The way of achieving this result was suggested by experiments on design objectives for hearing aids2 and for radio transmitters.3

In these experiments, it was found that the impairment of intelligibility due to severe peak clipping was reduced when a circuit with a rising frequency-response characteristic (pre-emphasis of the high frequency components of speech) was introduced ahead of the clipping circuit. Pre-emphasis of the low frequency components of speech, on the other hand, was found to increase the impairment of intelligibility caused by peak clipping.

PLAN AND PROCEDURE OF EXPERIMENT


The present experiment was planned with two related aims: (1) to study the possibility of producing highly intelligible speech patterns through the use of frequency-selective circuits in conjunction with an infinite peak clipper, and (2) to examine the interactions between the types of frequency distortion produced by two simple resistance-capacitance circuits and the type of amplitude distortion introduced by the infinite clipper.

2 H. Davis et al., Hearing Aids: An Experimental Study of Design Objectives (Harvard University Press, Cambridge, 1947).

3 K. D. Kryter and M. I. Stein, The Advantages of Clipping the Peaks of Speech Waves Prior to Radio Communication, Psycho-Acoustic Laboratory Report IC-83, 10 October 1944, PB 22859. N. B. Gross and J. C. R. Licklider, Effects of Tilting and Clipping upon the Intelligibility of Speech, Psycho-Acoustic Laboratory Report PNR-11, 15 April 1946, PB 52337. G. A. Miller and S. Mitchell, "Effects of distortion on the intelligibility of speech at high altitudes," J. Acous. Soc. Am. 19, 120-125 (1947).

The resistance-capacitance circuits, which are shown in Fig. 1, were driven by low impedance sources (cathode followers) and operated into high impedance loads (grids). Since in circuit A the impedance to current flow is determined almost entirely by the reactance of the capacitor, the instantaneous current through the circuit (hence the instantaneous voltage across the resistor) is proportional to the rate of change of the input voltage. Thus circuit A acts as a "differentiator" of the wave form of the input signal. Or, to take an alternative view, the effective current through arrangement A (hence the effective voltage across the resistor) is proportional to the frequency of the input signal, and circuit A "tilts" the input spectrum upward, introducing 6 db less attenuation for each octave increase in frequency throughout the range of speech frequencies. In circuit B, the impedance to the flow of audio-frequency current is essentially constant, since it is determined almost entirely by the resistance. The instantaneous voltage across the condenser is therefore proportional to the amount of charge that has flowed into the capacitor, and thus to the time integral of the input voltage. Circuit B acts, therefore, as an "integrator" of the input wave form. Or, again to take an alternative view, the effective voltage across the condenser is inversely proportional to the frequency of the input signal, and circuit B gives the spectrum a downward tilt of 6 db per octave.
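The two views of circuits A and B (differentiation and integration versus a 6-db-per-octave tilt) can be checked numerically with the standard first-order RC magnitude responses. The Python sketch below rests on assumed time constants: 10 µs for circuit A (0.001 µf times 10,000 Ω, the values legible in Fig. 1; they can only belong to A, because an integrator with that time constant would have its corner near 16,000 cps) and a hypothetical 10 ms for circuit B. Well below its corner frequency the differentiator rises about 6.02 db per octave, and well above its corner the integrator falls by about the same amount.

```python
import math

def differentiator_gain_db(f, tau):
    """RC differentiator (output across R): |H| = wt / sqrt(1 + wt^2).
    Behaves as a true differentiator when wt << 1 (capacitor reactance dominates)."""
    wt = 2 * math.pi * f * tau
    return 20 * math.log10(wt / math.sqrt(1 + wt * wt))

def integrator_gain_db(f, tau):
    """RC integrator (output across C): |H| = 1 / sqrt(1 + wt^2).
    Behaves as a true integrator when wt >> 1 (resistance dominates)."""
    wt = 2 * math.pi * f * tau
    return -10 * math.log10(1 + wt * wt)

TAU_A = 0.001e-6 * 10_000  # 10 microseconds (assumed, circuit A)
TAU_B = 10e-3              # 10 milliseconds (hypothetical, circuit B)

octave_edges = [200 * 2 ** k for k in range(5)]  # 200, 400, ..., 3200 cps
for lo, hi in zip(octave_edges, octave_edges[1:]):
    rise = differentiator_gain_db(hi, TAU_A) - differentiator_gain_db(lo, TAU_A)
    fall = integrator_gain_db(hi, TAU_B) - integrator_gain_db(lo, TAU_B)
    print(f"{lo:4d}-{hi:4d} cps: A {rise:+.2f} db, B {fall:+.2f} db")
```

The slight shortfall from 6.02 db in circuit A's highest octave shows where the approximation to true differentiation begins to weaken as the frequency approaches the corner.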

The clipping circuit consisted of five series-diode clippers of the type shown in Fig. 2A, isolated from each other by conventional resistance-capacitance coupled amplifiers which provided a total voltage gain of over 100 db. The nearly rectangular output of the cascaded clippers was further amplified and passed through the Schmidt circuit4 shown in Fig. 2B. This circuit, which acts as an all-or-nothing switch, served to minimize any deviations from ideal rectangularity which may have remained in the output of the clippers.

4 O. S. Puckle, Time Bases (John Wiley and Sons, Inc., New York, 1944), p. 57.

Fig. 1. The differentiating circuit (A) and the integrating circuit (B) used in the articulation tests. (Values legible in the figure: 0.001 µf, 10,000 Ω.)

Fig. 2. The nonlinear circuits of the infinite peak clipper used in the articulation tests. A is the basic clipper circuit of the peak-clipping amplifier. The amplifier included 5 of these clippers, isolated from one another by resistance-capacitance coupled amplifiers. B is the all-or-nothing switch (Schmidt circuit) used to remove any deviations from rectangularity which remained in the wave after the repeated clipping.

The differentiator, the integrator, and the infinite clipper were introduced, in various combinations, into an otherwise high quality system described below. Of the 16 cascade arrangements of the three circuits, taken none, one, two, or three at a time, six arrangements could be eliminated as effectively duplicating others because the operations of differentiation and integration, performed in succession, left the speech wave undistorted. The ten remaining arrangements were:


(1) No distortion
(2) Differentiation (Diff.)
(3) Integration (Int.)
(4) Infinite clipping (Clip.)
(5) Diff.+Clip.
(6) Int.+Clip.
(7) Clip.+Diff.
(8) Clip.+Int.
(9) Diff.+Clip.+Int.
(10) Int.+Clip.+Diff.
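The count of 16 cascades, and the elimination of six of them, can be reproduced mechanically. This Python sketch assumes, as the text does, that a differentiator and an integrator in immediate succession cancel, so any cascade containing that adjacent pair duplicates a shorter one; the operation labels follow the list above.

```python
from itertools import permutations

OPS = ("Diff.", "Int.", "Clip.")

def all_cascades():
    """Every ordered cascade of the three circuits, taken 0, 1, 2, or 3 at a time."""
    cascades = []
    for r in range(4):
        cascades.extend(permutations(OPS, r))
    return cascades

def has_cancelling_pair(cascade):
    """True if a differentiator and an integrator are adjacent (and so cancel)."""
    return any({a, b} == {"Diff.", "Int."} for a, b in zip(cascade, cascade[1:]))

cascades = all_cascades()
kept = [c for c in cascades if not has_cancelling_pair(c)]
print(len(cascades), "cascades,", len(cascades) - len(kept), "duplicates,", len(kept), "kept")
# prints: 16 cascades, 6 duplicates, 10 kept
for c in kept:
    print("+".join(c) or "No distortion")
```

Because `itertools.permutations` yields tuples in a fixed order, the ten surviving cascades print in the same order as the numbered list.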


The effect of each of these operations upon sine waves and speech waves is illustrated schematically in Fig. 3. The triangular waves and trains of pulses are shown as consisting of straight line segments. Actually, since the differentiator and the integrator provided close approximations of the mathematical operations only in the range of audio frequencies, the triangular waves and pulses were made up of exponential curves with very slight curvature. Extension of the frequency ranges of the resistance-capacitance circuits would have had little or no effect upon the results of the experiment, however, since frequencies above 7500 cps were attenuated by the earphones.
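The schematic waveforms of Fig. 3 can be approximated in discrete time. In this Python sketch, an idealization rather than a model of the actual circuits, a first difference stands in for the differentiator, a running sum stands in for the integrator (with its mean removed, an assumption standing in for the AC coupling of later stages), and a sign function stands in for the infinite clipper. Applied to a sine wave, arrangement (9) yields straight-line segments of equal steepness (a triangular wave) and arrangement (10) yields isolated spikes (a pulse train).

```python
import math

def differentiate(x):
    """Stand-in for circuit A: first difference (output is one sample shorter)."""
    return [x[n] - x[n - 1] for n in range(1, len(x))]

def integrate(x):
    """Stand-in for circuit B: running sum with the mean removed
    (mean removal mimics AC coupling; an assumption of this sketch)."""
    total, out = 0.0, []
    for v in x:
        total += v
        out.append(total)
    mean = sum(out) / len(out)
    return [v - mean for v in out]

def clip(x):
    """Idealized infinite peak clipper."""
    return [1.0 if v >= 0 else -1.0 for v in x]

SAMPLES_PER_CYCLE = 80
x = [math.sin(2 * math.pi * n / SAMPLES_PER_CYCLE) for n in range(5 * SAMPLES_PER_CYCLE)]

dci = integrate(clip(differentiate(x)))  # arrangement (9): Diff.+Clip.+Int.
icd = differentiate(clip(integrate(x)))  # arrangement (10): Int.+Clip.+Diff.

# D-C-I: every sample-to-sample slope is +1 or -1, i.e., straight segments.
print(sorted({round(s, 9) for s in differentiate(dci)}))
# I-C-D: zero everywhere except at the switching instants, i.e., pulses.
nonzero = sum(1 for v in icd if v != 0.0)
print(nonzero, "nonzero samples out of", len(icd))
```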

In order to determine the effect upon intelligibility of each of the 10 arrangements of the three distortions, word articulation tests were conducted with the distorters introduced into an otherwise high-quality audio communication system. The word lists were Nos. 1-5 of the Psycho-Acoustic Laboratory PB lists (50 monosyllabic words per list).5 Five scramblings of each list were recorded by a single talker on acetate disks, and these tests were played back through a system equalized within ±3 db from 100 to 7000 cps. The microphone was a Western Electric Type 633-A, and the earphones were Permoflux PDR-10's in sponge-neoprene cushions (MX-41/AR). Under each of the test conditions, the sound pressure level of the speech in the listeners' ears was approximately 85 db re 0.0002 dyne/cm², rms.
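For readers more used to absolute units, the stated level converts directly to an rms pressure by inverting SPL = 20 log10(p / p_ref). This small Python sketch does only that arithmetic; 85 db re 0.0002 dyne/cm² comes out at about 3.6 dyne/cm², or roughly 0.36 N/m².

```python
REFERENCE_DYNE_PER_CM2 = 0.0002  # reference pressure stated in the text

def spl_to_rms_pressure(spl_db, reference=REFERENCE_DYNE_PER_CM2):
    """Invert SPL = 20 log10(p / p_ref) to recover the rms pressure p."""
    return reference * 10 ** (spl_db / 20)

p = spl_to_rms_pressure(85)
print(f"{p:.2f} dyne/cm^2 rms ({p / 10:.3f} N/m^2)")  # 1 N/m^2 = 10 dyne/cm^2
```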

In all, 250 articulation tests were made: 25 with each of the 10 arrangements of the distorting circuits, and 10 with each of the five scramblings of each of the five word lists. The tests were so arranged that each combination of distortions was tested once in each block of 10 tests and, during the experiment, once with each of the 25 recorded lists. Except for these restrictions, the tests were made in random sequence, the schedule having been set up with the aid of a table of random numbers.

5 These lists are described by J. P. Egan in Articulation Testing Methods II, Psycho-Acoustic Laboratory Report OSRD No. 3802, 1 November 1944, PB 22848.
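The scheduling restrictions described above (every arrangement once per block of 10, every arrangement once with each of the 25 recorded lists, random order otherwise) can be met with a cyclic assignment. The Python sketch below shows one hypothetical way to build such a schedule, using a seeded pseudorandom generator in place of a printed table of random numbers; the report does not describe the procedure beyond these restrictions, and `make_schedule` is an illustrative name.

```python
import random

N_ARRANGEMENTS = 10  # the ten distortion arrangements
N_LISTS = 25         # 5 word lists x 5 scramblings

def make_schedule(seed=None):
    """Hypothetical schedule of (arrangement, recorded list) pairs: 25 blocks of 10.
    Arrangement a in block b gets list (b + offset[a]) mod 25; distinct offsets make
    the lists in a block distinct, and each arrangement meets every list once."""
    rng = random.Random(seed)
    labels = list(range(N_LISTS))
    rng.shuffle(labels)                                   # random relabelling of lists
    offsets = rng.sample(range(N_LISTS), N_ARRANGEMENTS)  # one distinct offset each
    blocks = list(range(N_LISTS))
    rng.shuffle(blocks)                                   # random order of blocks
    schedule = []
    for b in blocks:
        block = [(a, labels[(b + offsets[a]) % N_LISTS]) for a in range(N_ARRANGEMENTS)]
        rng.shuffle(block)                                # random order within a block
        schedule.extend(block)
    return schedule

schedule = make_schedule(seed=1)
print(schedule[:10])  # the first block of 10 tests
```

The cyclic offsets guarantee the balance constraints, while the three shuffles supply the randomness the text calls for.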


EFFECTS OF THE DISTORTIONS ON INTELLIGIBILITY


The effects of the several combinations of distortion upon intelligibility are shown in Figs. 4 and 5. In Fig. 4, the successive test scores (averages for the five listeners) are presented, and smooth curves are drawn to indicate the trend of the listeners' improvement with practice. In Fig. 5, the smooth curves are brought together in a single plot to facilitate intercomparison of the effects of the various permutations of differentiation, integration, and infinite peak clipping. It is evident that the ten curves fall into four groups.

The first group represents the intelligibility scores obtained with no distortion, with differentiation alone, and with integration alone. These scores are all very near 100 percent, and the differences among them are statistically insignificant. (A statistical analysis of the data is described in an appendix.) This result concerning their intelligibility is in marked contrast to the observations concerning their naturalness and timbre. Because it greatly emphasizes the fricative consonants and weakens the low-pitched vowels, differentiation makes speech sound overly crisp. Integration, on the other hand, emphasizes the low-pitched vowels, weakens the consonants, and makes speech sound muffled and "boomy." The safety factor of voice communication is such, however, that neither differentiation nor integration impairs the intelligibility of speech heard in quiet.

The fourth and fifth curves, which constitute the second group in Fig. 5, were obtained (1) with infinite clipping preceded by differentiation and (2) with infinite peak clipping preceded by differentiation and followed by integration. The intelligibility scores were over 90 percent, even for unpracticed listeners. After becoming familiar with the vocabulary and with the effects of distortion, the listeners missed only about 3 words in a hundred. (Scores over 90 percent on word tests correspond to essentially perfect reception of meaningful sentences.) Comparison of the two curves in this pair indicates that the action of the integrating circuit had essentially no effect upon the intelligibility of the output of the infinite peak clipper. If anything, the integrator made the speech slightly more intelligible, but the effect upon intelligibility was negligible compared to the effect upon quality. Actually, the only feature seriously marring the quality of differentiated-clipped-integrated speech was the low frequency noise which came through with considerable volume between words. When this noise was suppressed, it was very hard to believe that the speech could possibly have been through a series of distortions which, at one stage of the process, had removed all trace of the original amplitude pattern.
Fig. 3. Schematic illustration of the effects of the distortions upon sine waves and upon speech waves.

Fig. 4. Showing the articulation scores (averages for five listeners) for tests with each of the 10 arrangements of the distorting circuits.


The third group of curves in Fig. 5 includes the three curves obtained with the combinations in which infinite clipping was the initial distortion: infinite clipping alone, infinite clipping plus differentiation, and infinite clipping plus integration. Again it is evident that the process that follows clipping has but little effect on intelligibility, and again it is true that the articulation scores fail to reflect differences in quality and timbre that are quite striking to the listener. The downward tilt of the spectrum provided by the integrator makes the effect of infinite clipping sound less noticeable without in fact restoring any of the cues for recognition which were destroyed by the clipper. On the other hand, the upward tilt introduced by the differentiator makes the clipped speech sound even worse without eliminating any important cues. It should be noted that the initial articulation scores for infinite clipping are well within the range obtained in previous experiments. The marked rise to a level of 90 percent word articulation is due presumably to the unusual diligence of the listening crew and to the fact that the vocabulary of test words was limited to 250.

TABLE I. Recognition of words from the test vocabulary as compared with false recognition of words not used in the tests.

Certainty of     Words actually in     Words actually not in
recognition      test vocabulary       test vocabulary
     1                 786                      29
     2                 152                      44
     3                  87                      73
     4                 225                    1104

The fourth and lowermost group of curves in Fig. 5 is the pair for which integration preceded infinite clipping. Obviously, predistortion of the type introduced by the integrating circuit is incompatible with severe clipping. Again, the effect of the circuit following the clipper is not great, but there does appear to have been more improvement with practice when the differentiator was not used.

Comparison of the four groups of curves indicates that, although infinite clipping is always associated with some impairment of intelligibility, the amount of impairment depends markedly upon the circuit preceding the clipper. When that circuit is a differentiator, intelligibility is impaired very little. It remains high enough, in fact, to be adequate for many communication purposes. And when an integrator is used as a third circuit following the clipper, even subjective quality is reasonably good.


The Listeners’ Improvement with Practice 


A striking feature of the results is the degree to which the articulation scores improved during the course of the experiment (cf. Fig. 4). Inasmuch as the same recorded word lists were used in successive tenths of the experiment, this improvement cannot be the result of increased clarity of enunciation on the part of the talker. It must be attributed to learning on the part of the listeners, who prior to the experiment had had no experience with articulation tests. The question is, what did the listeners learn? Did they learn what words were on the test lists and what words were not? Did they learn the sequences in which the words occurred in the several scramblings? Or did they acquire a general skill that enabled them to understand distorted speech, whether or not it consisted of words from the test vocabulary?

At the end of the experiment, the listeners were presented with a series of 500 words: the 250 words of the five lists used in the articulation tests and 250 words from five other PB lists to which they had never been exposed. Words from the two sources were called in random sequence. The listeners were asked to indicate for each word whether or not it had been a part of the articulation-test vocabulary. A rating of one represented virtual certainty that the word had been on one of the test lists used in the main experiment; a rating of four, that it had not been. It is clear from the ratings, which are summarized in Table I, that the listeners had learned to recognize the words of the test vocabulary with considerable accuracy, though by no means infallibly. Ability to identify a word as belonging, or as not belonging, to the test vocabulary would of course allow the listener to record words correctly on the basis of reduced cues. Familiarity with the composition of the word lists would thus operate to increase the articulation scores in much the same way as do the contextual relations of connected discourse.
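Table I can also be read as a set of cumulative criteria. The Python sketch below, an illustrative reanalysis rather than part of the original report, computes for each rating threshold the fraction of test-vocabulary words and of never-heard words that were rated at or below it; each column totals 1250, i.e., 250 words judged by each of five listeners. At the strictest criterion (rating 1) the listeners accepted about 63 percent of the test words but only about 2 percent of the new ones.

```python
# Table I counts: rating -> (words actually in test vocabulary, words not in it)
TABLE_I = {1: (786, 29), 2: (152, 44), 3: (87, 73), 4: (225, 1104)}

def cumulative_rates(table):
    """For each rating r, the fraction of each kind of word rated r or lower."""
    n_in = sum(v[0] for v in table.values())
    n_out = sum(v[1] for v in table.values())
    rows, cum_in, cum_out = [], 0, 0
    for rating in sorted(table):
        cum_in += table[rating][0]
        cum_out += table[rating][1]
        rows.append((rating, cum_in / n_in, cum_out / n_out))
    return rows

for rating, hit, false_alarm in cumulative_rates(TABLE_I):
    print(f"rating <= {rating}: test words {hit:.3f}, new words {false_alarm:.3f}")
```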

After the recognition test, the five listeners were given a series of 25 supplementary articulation tests in which the recorded word lists were played back just as they were in the main experiment, except that every fifth word was blanked out. The listeners were asked to write down the words they heard and to fill in the missing words from memory, guessing in case of doubt. Ten words were blanked out on each of the 25 tests. In all, 1250 attempts to recall were made. Only 26 of the 1250 responses (approximately 2 percent) were correct. It would appear, therefore, that although the listeners did learn to recognize the words in the test vocabulary, they did not learn the sequences of the words in the 25 scramblings well enough to affect the articulation scores appreciably through direct recall, as distinguished from recognition.

As a final observation on the listeners' improvement with practice, a short series of articulation tests was made with three sets of recorded word lists: (1) the words used in the main experiment in their original sequences, (2) the same words in new sequences, and (3) an entirely different set of words, also from PB lists, and very nearly equal in difficulty to the words used in the main experiment. Ten lists from each of the three sets were read in random sequence. Half the tests were conducted with infinite peak clipping, half with infinite peak clipping plus differentiation. The results are summarized in Table II. For purposes of comparison, averages for part of the main experiment are also included.

TABLE II. Results of supplementary articulation tests and of comparable parts of the main experiment. (Each value of percent word articulation is the average for five tests with each of five listeners.)

                                        Infinite     Infinite clipping
Lists                                   clipping     plus differentiation
Old words, old sequence                   95.4             95.8
Old words, new sequence                   94.8             95.4
New words                                 85.8             86.3
First five tests of main experiment       71.1             72.2
Over-all average of main experiment       85.9             84.7

As shown in the table, the scores were only about half a percentage unit lower with new scramblings of the familiar word lists than they were with scramblings which had been heard ten times each. This confirms the finding that the listeners did not learn the sequences in which the words occurred. With new words, however, the scores were about 10 percentage units lower than they were with the familiar vocabulary. This confirms the finding that familiarity with the test vocabulary is an important factor. The table also shows, however, that the scores with new words in the supplementary tests were almost 15 percentage units higher than the corresponding scores were at the beginning of the main experiment, at which time the word lists used in that experiment were new to the listeners. The skill developed by the listeners during the tests is thus only in part specific to the words of the test vocabulary. It is to a considerable extent a general skill, an ability to identify words correctly despite severe distortion.

Fig. 5. Showing the effects of various combinations of differentiation, integration, and infinite peak clipping upon intelligibility. The smooth curves are replotted from Fig. 4 to facilitate intercomparison. The heights of the bars of the column diagram (inset) indicate the over-all averages for the ten arrangements of the distorting circuits.

Comparison of the third and fifth lines of Table II shows that the scores obtained in the supplementary tests with the new words are nearly equal to the averages for the 25 tests of the main experiment. This coincidence provides a convenient interpretation for the average scores of the main experiment: they may be regarded as approximate values of the intelligibility of unfamiliar distorted words to listeners who have had experience with severely distorted speech. For inexperienced listeners, the intelligibility of unfamiliar words is in general somewhat lower, as indicated in Figs. 4 and 5.
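The comparisons drawn from Table II are simple differences between rows. The Python sketch below recomputes them; note that the 72.2 entry is a reading of a garbled "12,2" in the scanned table, adopted because it agrees with the text's "almost 15 percentage units."

```python
# Table II: (infinite clipping, infinite clipping plus differentiation)
TABLE_II = {
    "old words, old sequence": (95.4, 95.8),
    "old words, new sequence": (94.8, 95.4),
    "new words": (85.8, 86.3),
    "first five tests of main experiment": (71.1, 72.2),
    "over-all average of main experiment": (85.9, 84.7),
}

def difference(row_a, row_b):
    """Percentage-unit differences row_a - row_b, one per clipping condition."""
    return tuple(round(a - b, 1) for a, b in zip(TABLE_II[row_a], TABLE_II[row_b]))

print("sequence learning:  ", difference("old words, old sequence", "old words, new sequence"))
print("vocabulary learning:", difference("old words, new sequence", "new words"))
print("general skill:      ", difference("new words", "first five tests of main experiment"))
print("new vs. main average:", difference("new words", "over-all average of main experiment"))
```

The first line corresponds to the "half a percentage unit," the second to the "about 10 percentage units," and the third to the "almost 15 percentage units" cited in the text.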


DISCUSSION 
Bivariate Code 


It is evident from the results that a differentiator and a clipper may be used to reduce speech to an intelligible bivariate code, and that this code can be restored to a reasonably natural-sounding reproduction of the original speech simply by passing it through an integrating circuit. For some applications it might be convenient to replace the rectangular waves with pulses (one pulse to indicate each switch of the rectangular wave). This substitution and the inverse operation required to recover the rectangular wave from the pulses are easily handled by simple electronic circuits. However, the substitution of pulses for rectangular waves does not alter the formal characteristics of the code, which can still be thought of as always having one or the other of two amplitude values.
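The substitution of pulses for the rectangular wave, and its inverse, can be sketched in a few lines. In this Python illustration a pulse marks each switch, and recovery works the way a toggling flip-flop (such as the Eccles-Jordan trigger mentioned in the introduction) would, flipping the output at every pulse; the function names are hypothetical. One detail the sketch makes explicit: the pulses carry the switching instants but not the initial polarity, which must be known or agreed on.

```python
def to_pulses(rect):
    """Mark each switch of a two-valued (+1/-1) wave with a 1; otherwise 0."""
    return [0] + [1 if rect[n] != rect[n - 1] else 0 for n in range(1, len(rect))]

def from_pulses(pulses, initial=1):
    """Toggle recovery: every pulse flips the output level, as a flip-flop would."""
    level, out = initial, []
    for p in pulses:
        if p:
            level = -level
        out.append(level)
    return out

rect = [1, 1, -1, -1, -1, 1, -1, -1, 1, 1]
pulses = to_pulses(rect)
print(pulses)  # [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]
assert from_pulses(pulses, initial=rect[0]) == rect
```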

The practical utility of rectangular speech waves or their derivatives will of course depend to a considerable extent upon the number of square waves or pulses per second that are required to provide high intelligibility. If it should turn out, for example, that pulse-time modulation techniques were more efficient in producing an intelligible bivariate code than combinations of tilting and clipping, then there would be little interest in the latter as the basis of a practical procedure. But preliminary observations suggest that the efficiency of the code obtained by tilting and clipping is somewhat higher, when expressed in terms of intelligibility per pulse (or intelligibility per switch of the rectangular wave), than is the efficiency of the codes which result from the application of conventional pulse modulation procedures. When a typical sample of normal speech is subjected to differentiation and then to infinite peak clipping, about 1500 rectangular waves, and therefore 3000 switchings, are produced each second. Pulse modulation schemes, on the other hand, usually provide at least five pulses for each cycle of the highest frequency component to be transmitted, thereby requiring about 15,000 pulses per second to provide fair reproduction out to 3000 cps. There is little doubt that lower repetition frequencies could be used without destroying intelligibility completely. But the presence of a strong component at the repetition frequency of the unmodulated pulses argues against the use of the conventional methods of modulating the position, duration, or repetition frequency of pulses when the average number of pulses is as few as 3000 per second.
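The rate comparison above is simple arithmetic, sketched here in Python using only the figures quoted in the text: two switchings per rectangular wave for the tilt-and-clip code, and five pulses per cycle of the highest transmitted frequency for conventional pulse modulation. On these figures the conventional scheme needs five times as many pulses per second.

```python
def switchings_per_second(rect_waves_per_second):
    """Each cycle of a rectangular wave contains two switchings."""
    return 2 * rect_waves_per_second

def pulses_per_second(pulses_per_cycle, highest_frequency_cps):
    """Conventional pulse modulation: a fixed number of pulses per cycle of the
    highest frequency component to be transmitted."""
    return pulses_per_cycle * highest_frequency_cps

tilt_clip = switchings_per_second(1500)   # differentiated, clipped speech
pulse_mod = pulses_per_second(5, 3000)    # five pulses per cycle, out to 3000 cps
print(tilt_clip, pulse_mod, pulse_mod / tilt_clip)  # 3000 15000 5.0
```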


Root-Mean-Square Harmonic Distortion 


Previous experiments have suggested that there is little or no correlation between the impairment of intelligibility due to amplitude distortion and the severity of distortion as expressed in terms of measurements with sinusoidal test signals.6 The present results provide supplementary data on this point.

6 See reference 1, pp. 43-49.

In the present experiment, the situation is unusually simple because, although there were seven arrangements of distorting circuits which introduced nonlinear distortion, they gave rise to only three different output wave forms (cf. Fig. 3). The three arrangements in which the infinite clipper was the final circuit gave rise to rectangular waves. The two arrangements in which the clipper was followed by the integrating circuit produced triangular waves. The two arrangements in which the clipper was followed by the differentiating circuit gave rise to trains of pulses. All that is necessary is (1) to calculate how severely a sine wave is distorted when it is converted into a rectangular wave, a triangular wave, or a train of pulses and (2) to correlate the calculated indices of distortion with the measures of intelligibility provided by the articulation tests. This is done in Table III. The values of percentage distortion were computed from the formula


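$$\text{percentage distortion} = 100\,\frac{\left(E_2^2 + E_3^2 + E_4^2 + \cdots + E_n^2\right)^{1/2}}{\left(E_1^2 + E_2^2 + E_3^2 + E_4^2 + \cdots + E_n^2\right)^{1/2}}$$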


in which $E_n$ is the amplitude of the nth harmonic. The first 49 harmonics of the output wave were taken into account. The table shows rather conclusively that distortion measurements made with sinusoidal test signals do not get at the important factors governing intelligibility.
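The percentage-distortion formula can be evaluated for idealized versions of the three output waves. This Python sketch assumes ideal waveforms containing odd harmonics only, with amplitudes proportional to 1/k for the square wave, 1/k² for the triangular wave, and equal for the alternating pulse train, truncated at the 49th harmonic as in the text. It gives about 42.8 and 98.0 percent for the square wave and the pulses, close to the tabulated 42.7 and 98.0, and about 12.0 percent for the ideal triangular wave against the tabulated 11.6, a small discrepancy this idealization does not explain.

```python
import math

N_HARMONICS = 49  # highest harmonic taken into account, as in the text

def percent_distortion(amplitudes):
    """amplitudes[0] is E1 (fundamental); amplitudes[1:] are E2..En."""
    total = sum(e * e for e in amplitudes)
    harmonic_power = total - amplitudes[0] ** 2
    return 100 * math.sqrt(harmonic_power / total)

def odd_harmonics(amplitude_of):
    """Amplitudes E1..E49 for a wave containing odd harmonics only."""
    return [amplitude_of(k) if k % 2 == 1 else 0.0 for k in range(1, N_HARMONICS + 1)]

waves = {
    "square": odd_harmonics(lambda k: 1 / k),           # clipped sine
    "triangular": odd_harmonics(lambda k: 1 / k ** 2),  # clipped, then integrated
    "pulses": odd_harmonics(lambda k: 1.0),             # clipped, then differentiated
}
for name, amps in waves.items():
    print(f"{name:10s} {percent_distortion(amps):5.1f}%")
```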


Intensity Distribution in Speech as a Factor Governing Intelligibility


Because it eliminates many of the characteristic features of the speech wave, infinite peak clipping provides a method of determining whether or not certain features are important for intelligibility. For example, infinite clipping reduces the envelope of the speech wave to the simplest possible form: two parallel straight lines. This is the case whether or not one of the tilting circuits is introduced ahead of the clipper, and it remains the case even if the differentiator (but not the integrator) follows the clipper. Inasmuch as essentially perfect communication is possible with at least one of the circuit arrangements that reduces the envelope to this simple form, it is evident that the so-called dynamic characteristics of speech are not of vital importance for intelligibility. It is apparently just as well to reproduce all the fundamental speech sounds (or what is left of them after infinite clipping) at the same intensity level as it is to preserve their normal intensities.

This fact poses a problem for any theory of intelligibility in which the distribution of speech intensities is regarded as a fundamental determinant of intelligibility. This is not to say that the intensity distribution of normal speech does not influence the intelligibility of normal speech. But the variations in intensity from moment to moment appear not to be basic cues for the recognition of words.

TABLE III. Showing the lack of relation between values of percentage harmonic distortion and the articulation scores.

                   Percentage    Articulation scores, by circuit preceding the clipper
Output             distortion    Differentiator     None     Integrator
Triangular waves      11.6            97.4          86.0         -
Square waves          42.7            97.9          85.9        14.8
Pulses                98.0              -           84.7        14.7
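The claim that infinite clipping flattens the envelope to two parallel lines, leaving only the temporal pattern of the axis crossings, can be illustrated with a short Python sketch. Its test signal is hypothetical: a 400-cps tone whose intensity swells and fades, standing in for the changing loudness of successive speech sounds, sampled at an assumed 8000 samples per second.

```python
import math

def infinite_clip(samples):
    """Idealized infinite peak clipper: keep only the sign of each sample."""
    return [1.0 if s >= 0 else -1.0 for s in samples]

def zero_crossings(samples):
    """Indices where the signal changes sign (crosses the time axis)."""
    return [n for n in range(1, len(samples))
            if (samples[n] >= 0) != (samples[n - 1] >= 0)]

FS = 8000  # samples per second (assumed)
# Envelope rises from 0.05 to 1.0 and back over 0.2 s; the carrier is 400 cps.
x = [(0.05 + 0.95 * math.sin(math.pi * n / 1600) ** 2)
     * math.sin(2 * math.pi * 400 * n / FS + 0.3) for n in range(1600)]
y = infinite_clip(x)

# The input's peak magnitude varies widely; the clipped output's never does.
print(max(abs(v) for v in x[:200]), max(abs(v) for v in x[700:900]))
print({abs(v) for v in y})
# Only the pattern of axis crossings survives the clipper.
assert zero_crossings(y) == zero_crossings(x)
```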


Noise Between Words


One of the characteristics of normal speech is 
that there are intervals between words or phrases 
during which the talker generates no signal. It 
often happens, of course, that these intervals are 
filled with noise, but under these circumstances 
the speech itself is heard against a background of 
noise, and intervals between words and phrases 
are relatively, if not absolutely, quiet. With an 
infinite peak clipper in the communication cir- 
cuit, however, the intervals between words are 
just as full of sound as are the periods occupied 
by the words. Since the infinite clipper acts in 
such a way that the output wave switches 
whenever the input wave crosses the time axis, 
the intervals between words are full of ‘‘rect- 
angular noise.’’ In the articulation tests, this 
noise was due largely to record scratch. When 
records are not used, it is due either to hum from 
the supply lines or to fluctuation noise in the
input circuits of the speech amplifier (shot noise 
or resistance noise). 
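A short simulation shows why the pauses fill with sound. The input is hypothetical. Weak random fluctuation noise stands in for hum, shot or resistance noise, or record scratch, and a 500-cycle tone stands in for a word. The `squelch` function is also a hypothetical illustration of the kind of circuit a practical system might add. It turns the clipper output off whenever the short-term r.m.s. level of the input falls below a threshold, and its frame length and threshold are arbitrary choices.

```python
import numpy as np

FS = 8000                    # samples per second (assumed)
N = FS // 4                  # 0.25 s for the pause and 0.25 s for the "word"
rng = np.random.default_rng(0)


def infinite_clip(x):
    """Idealized infinite peak clipper: +1 where x >= 0, -1 elsewhere."""
    return np.where(x >= 0.0, 1.0, -1.0)


def rms(v):
    return float(np.sqrt(np.mean(np.square(v))))


t = np.arange(N) / FS
pause = 1e-4 * rng.standard_normal(N)                          # weak noise only
word = 0.5 * np.sin(2 * np.pi * 500 * t) + 1e-4 * rng.standard_normal(N)
x = np.concatenate([pause, word])
y = infinite_clip(x)

print("input  pause/word r.m.s. ratio:", rms(x[:N]) / rms(x[N:]))   # about 3e-4
print("output pause/word r.m.s. ratio:", rms(y[:N]) / rms(y[N:]))   # exactly 1.0


def squelch(x, y, frame=80, threshold=1e-2):
    """Hypothetical squelch: silence y in every frame where the input x is weak."""
    out = y.copy()
    for start in range(0, len(x), frame):
        seg = slice(start, start + frame)
        if rms(x[seg]) < threshold:
            out[seg] = 0.0
    return out


gated = squelch(x, y)
print("after squelch, pause r.m.s.:", rms(gated[:N]))               # 0.0
```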

A listener’s first reaction to the noise between 
words is usually one of amazement that speech 
can be so clear in the presence of such loud noise. 
After a minute or two, however, the listener 
notices that the noise diminishes as soon as the 
speech appears, and thereafter he regards the 
noise simply as a nuisance. In the articulation 
tests, nothing was done to eliminate the noise, 
although clearly in any practical application a 
“squelch” circuit to silence the intervals between 
words would be indicated. The question did arise, 
however: Is intelligibility impaired by the presence of rectangular noise between the words of rectangular speech? The problem is essentially one of the temporal spread of masking. It concerns the capacity of the listener's auditory system to establish a figure-ground relation between components of the acoustic stimulus presented in temporal alternation.

TABLE IV. Mean articulation scores for each of the treatments of the principal experimental variables.*

A. Distortion
 (1) No distortion            99.6      (6) I-C       14.8
 (2) Differentiation (D)      99.5      (7) C-D       84.7
 (3) Integration (I)          99.5      (8) C-I       86.0
 (4) Infinite clipping (C)    85.9      (9) D-C-I     97.4
 (5) D-C                      97.9     (10) I-C-D     14.7

B. Listener
 B 87.0; C 86.5; H 85.6; S 85.2; K 80.0

C. Block of 10 tests
 (1) 68.2    (6) 80.3   (11) 84.7   (16) 85.7   (21) 89.8
 (2) 77.2    (7) 81.7   (12) 86.9   (17) 88.7   (22) 87.9
 (3) 79.0    (8) 83.1   (13) 86.0   (18) 89.0   (23) 87.7
 (4) 81.3    (9) 80.7   (14) 85.3   (19) 86.1   (24) 89.3
 (5) 80.4   (10) 85.0   (15) 88.2   (20) 89.6   (25) 88.1

* The values in the table are actually the percentage scores that correspond to the means of the transformed scores described in the text.

In order to assess the effect of the noise be- 
tween words, it was necessary to find a suitable 
way of suppressing it. This was accomplished by 
making the noise ultrasonic. By introducing a 
20,000-cycle sine wave at an intensity just sufficient to override the background noise, it is possible to eliminate all audible noise between words. This method is applicable, however, only when the speech-to-noise ratio at the input to the clipping circuit is quite high. If the intensity of the speech is not well above that of the ultrasonic tone, there is danger that a spurious effect, a "duty-cycle modulation" of ultrasonic rectangular waves, will make the rectangular speech waves more intelligible than they would be with infinite clipping per se.⁷
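The spurious duty-cycle effect follows from a simple idealization. Suppose the clipper input is a sine-wave tone of unit amplitude plus a speech value s that stays nearly constant over a few cycles of the tone, with |s| < 1. The sum is positive for a fraction 1/2 + arcsin(s)/π of each cycle. After clipping, the average over whole cycles (the job a low-pass filter would do) is therefore (2/π) arcsin(s), which is nearly proportional to s when s is small. The sketch checks that relation numerically. The simulation rate, the number of cycles averaged, and the test values of s are assumptions of the sketch, not quantities from the experiment. With s = 0, which corresponds to a pause, the detected level is zero.

```python
import numpy as np

FS = 4_000_000                     # simulation rate, samples per second (assumed)
TONE_HZ = 20_000                   # the ultrasonic tone
SAMPLES_PER_CYCLE = FS // TONE_HZ  # 200 samples per cycle of the tone


def infinite_clip(x):
    """Idealized infinite peak clipper: +1 where x >= 0, -1 elsewhere."""
    return np.where(x >= 0.0, 1.0, -1.0)


def detected_level(s, tone_amp=1.0, cycles=50):
    """Clip (tone + steady input s), then average over whole tone cycles.

    Averaging over complete cycles plays the role of the low-pass filter
    for an input that is steady over that interval.
    """
    n = SAMPLES_PER_CYCLE * cycles
    t = (np.arange(n) + 0.5) / FS      # offset keeps samples off the crossings
    x = tone_amp * np.sin(2 * np.pi * TONE_HZ * t) + s
    return float(infinite_clip(x).mean())


def duty_cycle_theory(s, tone_amp=1.0):
    """Average of the clipped wave predicted for |s| < tone_amp."""
    return float(2.0 / np.pi * np.arcsin(s / tone_amp))


for s in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(f"s = {s:+.1f}   detected = {detected_level(s):+.3f}   "
          f"theory = {duty_cycle_theory(s):+.3f}")
```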

When the ultrasonic tone was made just intense 
enough to override the background noise, switch- 
ing it off and on had a dramatic effect upon the 
apparent quality of the overall transmission, but
there was little or no change in intelligibility. 

7 The high-frequency tone, passing through the infinite 
clipper, tends to give rise to square waves at high repetition 
frequency. Because speech is superimposed upon the tone, 
however, these waves are not square, but are modulated 
in on-off fraction or in duty-cycle. Normal speech thereby 
manages to get through the infinite clipper as a type of modulation which requires only a low-pass filter for its "detection."
TABLE V. Summary of the analysis of variance.

Source of variation              n     Mean square variation      F       P
Distortion (D)                   9           237,442            2730   <0.01
Listener (Li)                    4             3,596            41.4   <0.01
Learning (Le)                   24             1,850            21.3   <0.01
Interactions and error        1211              86.9
  D×Li                          36             235.6            4.45   <0.01
  D×Le                         216             204.7            3.87   <0.01
  Li×Le                         96              71.8            1.36   <0.01
  Triple interaction and error 863              52.9


This result is in line with what would be expected on the basis of extrapolation from data on the temporal spread of masking⁸ and on the effect of cutting off the initial segments of words.⁹ When the masked sound alternates with the masking sound, and when the two sounds are equally intense, not more than 2 or 3 milliseconds of the masked sound are rendered inaudible. Since the elimination of a 2- or 3-millisecond segment from the beginning of each syllable in a list of undistorted consonant-vowel-consonant syllables causes essentially no decrement in articulation, it is not unreasonable that the noise between the words of infinitely clipped speech should produce little or no impairment of intelligibility.


APPENDIX 
Statistical Analysis of the Articulation Data 


It is convenient to think of the 1250 scores 
made by the five listeners on the 250 articulation 
tests of the main experiment as forming a three- 
dimensional matrix in which the variables are 
distortion, listener, and learning. The 10 arrange- 
ments of the three distorting circuits are then 
regarded as 10 treatments of the variable, dis- 
tortion. The five listeners are looked upon as five 
treatments of the variable, listener. And the 25 blocks, of 10 tests each, are thought of as 25 treatments of the variable, learning, or more precisely as 25 treatments of a temporal variable that affords an opportunity for the listeners to show improvement with practice.

8 R. L. Miller, "Masking effect of periodic pulses of vibrations as a function of time and frequency," J. Acous. Soc. Am. 19, 735 (1947). Also, personal communication from G. A. Miller, May, 1947.

9 J. C. Steinberg in Electrical Engineer's Handbook (John Wiley and Sons, New York, 1936), Chap. 9, p. 34.

The mean articulation percentages for the 10 distortions, for the 5 listeners, and for the 25 blocks of tests are shown in Table IV. Inasmuch as the average scores for the five different word lists fell within a range of 2.4 percentage units, there is no need to complicate the picture by taking account of the variation among word lists.
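The degrees of freedom in the analysis follow from this three-way layout. The sketch counts them for a complete 10 × 5 × 25 matrix with one score per cell, and its variable names are illustrative only. For such a layout, the pooled interactions-and-error term has 1212 degrees of freedom and the triple interaction has 864. Table V lists 1211 and 863, and the text does not say what accounts for the difference of one.

```python
from itertools import combinations
from math import prod

levels = {"D": 10, "Li": 5, "Le": 25}   # distortions, listeners, blocks of 10 tests
n_scores = prod(levels.values())        # 1250 scores in the matrix

df = {name: k - 1 for name, k in levels.items()}
for a, b in combinations(levels, 2):    # two-factor interactions
    df[f"{a}x{b}"] = df[a] * df[b]
df["DxLixLe"] = df["D"] * df["Li"] * df["Le"]

pooled = sum(v for k, v in df.items() if "x" in k)
print(df)
print("interactions and error pooled:", pooled)
print("total:", sum(df.values()), "=", n_scores - 1)
```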

In order to obtain an indication of the potencies 
of the three variables just mentioned, relative to 
each other and to the ‘error’ fluctuation within 
the experiment, an analysis was made of the 
variations within the matrix of data. Actually, 
since the articulation scores themselves showed 
the heterogeneity of variance that is characteristic 
of percentages, a form of Fisher's arc sine transformation¹⁰ was applied to each of the 1250 percentage scores (A), and the analysis of variance was made with the transformed scores (I),

$$I = 50\,\sin^{-1}\!\left(\frac{A}{50} - 1\right).$$
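The transformation, and the reason for using it, can be sketched in a few lines under two assumptions. First, the inverse sine is taken in radians. This makes dI/dA = 1 at A = 50, so transformed units equal percentage units there, which is consistent with how the error variance is interpreted later in this appendix. Second, word scores are simulated as binomial, with all words equally difficult. Under these assumptions the variance of the raw percentage scores changes with the underlying probability p, while the variance of I stays near 2500/N = 50 for a 50-word test over the middle of the range.

```python
import numpy as np

N_WORDS = 50  # words per articulation test


def arcsine_transform(A):
    """I = 50 * arcsin(A/50 - 1) for percentage scores A in [0, 100]; radians."""
    return 50.0 * np.arcsin(np.asarray(A, dtype=float) / 50.0 - 1.0)


rng = np.random.default_rng(1)
results = {}
for p in (0.2, 0.5, 0.8):
    words_right = rng.binomial(N_WORDS, p, size=200_000)   # simulated tests
    A = 100.0 * words_right / N_WORDS                      # percentage scores
    results[p] = (A.var(), arcsine_transform(A).var())
    print(f"p = {p:.1f}   var(A) = {results[p][0]:5.1f}   var(I) = {results[p][1]:5.1f}")
```

The raw variance is about 32 at p = 0.2 and 0.8 but about 50 at p = 0.5, whereas var(I) stays close to 50 in all three cases. This heterogeneity of variance is what the transformation was introduced to remove.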


The results of this analysis are shown in Table V. 

In Table V, the three principal variables and their interactions are listed as sources of variation, and for each source the number of degrees of freedom (n) and the mean square variation per degree of freedom are indicated. F is the ratio of the mean square variation associated with a particular source to the appropriate residual mean square variation. For the three principal variables, the residual is the mean square for interactions and error combined (86.9). For the two-factor interactions, it is the mean square for triple interaction and error (52.9). P is the probability that an F as high as, or higher than, the one actually obtained would arise as the result of the fluctuations of random sampling. It is evident from the
table that, of the three main variables, distortion 
was the predominant one, and that each of the 
principal variables completely overshadowed the 
interactions and ‘error’ combined. Thus, although the effects of the 10 distortions were not entirely the same for all five listeners (significant D×Li interaction), and although the listeners improved with practice more with some distortions than with others (significant D×Le interaction), and although some listeners learned more than others (significant Li×Le interaction), these qualifications are of minor moment relative to the large variations in intelligibility caused by the distortions themselves.

10 R. A. Fisher, "On the dominance ratio," Proc. Roy. Soc. Edinburgh 42, 321-341 (1922). An excellent discussion of transformation to homogenize variance is contained in C. Eisenhart, M. W. Hastay, and W. A. Wallis, Selected Techniques of Statistical Analysis (McGraw-Hill Book Co., New York, 1947), Chap. 16.
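The F values in Table V can be recomputed from the printed mean squares. This also shows which residual each F uses. The three principal variables reproduce the printed F when divided by the pooled interactions-and-error mean square (86.9). The two-factor interactions reproduce it when divided by the triple-interaction-and-error mean square (52.9). The sketch only repeats the arithmetic. The probabilities in column P would require the F distribution (for instance SciPy's `scipy.stats.f.sf`), which is not used here.

```python
# Mean squares and F values as printed in Table V.
table_v = {
    "D":     (237_442.0, 2730.0),
    "Li":    (3_596.0,   41.4),
    "Le":    (1_850.0,   21.3),
    "DxLi":  (235.6,     4.45),
    "DxLe":  (204.7,     3.87),
    "LixLe": (71.8,      1.36),
}
MS_POOLED = 86.9   # interactions and error
MS_TRIPLE = 52.9   # triple interaction and error

recomputed = {}
for source, (mean_square, f_printed) in table_v.items():
    # Two-factor interactions are tested against the triple-interaction term;
    # the principal variables against the pooled interactions and error.
    residual = MS_TRIPLE if "x" in source else MS_POOLED
    recomputed[source] = mean_square / residual
    print(f"{source:6s} F = {recomputed[source]:8.2f}   (Table V: {f_printed})")
```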

It is of some interest to consider the magnitude 
of the variance designated in the table as ‘‘triple 
interactions and error.’’ Since triple interactions 
tend to be quite small in experiments of this type, 
the value 52.9 can be taken as an estimate of the 
error variance, i.e. the variance that would be 
obtained if a very large number of 50-word 
articulation tests were conducted under ‘‘con- 
stant conditions” with a system that provided, as 
a long-run average, 50-percent word articulation. 
But it is possible, on the simplifying assumption 
that all of the test words are of the same degree of 
difficulty, to compute the parametric value of the 
variance of the percentage scores of such a series 
of tests. This ‘‘true’’ variance is 


$$\sigma^2 = 4Npq = 4(50)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) = 50,$$

where N is the number of words in the test, p is the probability that a word will be heard correctly, and q is the probability that a word will be missed. The parametric value 50 agrees surprisingly well with the obtained value 52.9.
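The factor 4 in this formula is specific to a 50-word test. Each word is worth 2 percentage units, so the variance Npq of the number of words heard correctly is multiplied by 2² = 4. The sketch below gives the general binomial form under the same equal-difficulty assumption, and its function name is illustrative.

```python
def percent_score_variance(n_words, p):
    """Binomial variance of a percentage articulation score.

    Assumes every word is equally difficult, so the number of words heard
    correctly is binomial(n_words, p); each word is worth 100/n_words
    percentage units, which scales the count variance n*p*q by (100/n_words)**2.
    """
    q = 1.0 - p
    return (100.0 / n_words) ** 2 * n_words * p * q


print(percent_score_variance(50, 0.5))   # 50.0, the parametric value in the text
print(percent_score_variance(50, 0.9))   # smaller toward either end of the scale
```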

As an estimate of the inherent variability in 
the experiment, therefore, we have a variance of 
about 50 square units, or a standard deviation of 
a little over 7 units for a distribution of individual 
scores. 

The units in terms of which this estimate is 
expressed are the units of the transformed scale. 
The relation between the transformed scale and 
the percent word articulation scale is such that 7 
percentage units is an unbiased estimate of the 
standard deviation for articulation scores near 50 
percent, but an overestimate for articulation 
scores near either end of the percentage scale. It 
is convenient, nevertheless, to take 7 units as the 
standard error of a single score, remembering 
that it tends everywhere except at the 50-percent 
point to be a little too high. On this basis, the standard error of the mean score for any one of the 10 arrangements of the distorting circuits is $\sigma_D = (50/125)^{1/2} = 0.63$ in percentage units, since the mean is based on 125 individual scores. Similarly, the standard error of an individual listener's mean is estimated by $\sigma_{Li} = (50/250)^{1/2} = 0.45$, and the standard error of the mean score for a set of 25 tests is estimated by $\sigma_{\text{block}} = (50/50)^{1/2} = 1.00$.
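Each of these standard errors is the square root of the single-score error variance (taken as 50) divided by the number of scores entering the mean. A minimal check of the arithmetic follows, in which the labels are descriptive only.

```python
import math

ERROR_VARIANCE = 50.0   # estimated variance of a single score, transformed units


def standard_error_of_mean(n_scores, variance=ERROR_VARIANCE):
    """Standard error of a mean of n_scores independent scores."""
    return math.sqrt(variance / n_scores)


means = {
    "one arrangement of the distorting circuits": 125,
    "one listener": 250,
    "mean of 50 scores": 50,
}
for label, n in means.items():
    print(f"{label:45s} n = {n:3d}   s.e. = {standard_error_of_mean(n):.2f}")
```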